KoGraR: Standardized Statistical Analyses of Corpus Counts

Authors

Sascha Wolfer

Sandra Hansen-Morath

Hans-Christian Schmitz

Abstract

Within the project “Corpus grammar” (Korpusgrammatik) at the Institute for the German Language (Institut für Deutsche Sprache, IDS) in Mannheim, techniques and tools are being developed for describing grammatical phenomena based on analyses of very large, morphosyntactically annotated corpora. The goal of the project is a corpus-based grammar that captures variation in grammatical structure in present-day German. In the first project phase, pilot studies were conducted (cf. Bubenhofer et al., 2014; Fuß, 2014; Konopka, 2014) to exploit and evaluate various methodological approaches to variation phenomena. For each research question, statistical analyses were chosen and customized. From these analyses, a subset was extracted as the methodological core of the project, with the aim of supporting methodological coherence, interoperability of sub-projects and, finally, the descriptive coherence of the project result, that is, the grammar. The methodological core has been made available to project members via an easy-to-use web front-end called KoGraR: the results of corpus queries and other user-defined data tables can be uploaded and analyzed automatically.


Similar resources

Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...

Growing Trees from Morphs: Towards Data-Driven Morphological Parsing

We present a quantitative approach to disambiguating flat morphological analyses and producing more deeply structured analyses. Based on existing morphological segmentations, possible combinations of resulting word trees for the next level are filtered first by criteria of linguistic plausibility and then by weighting procedures based on the geometric mean. The frequencies for weighting are der...

The Assessment of Pragmatic Knowledge in the Online General IELTS-Practice Resources: A Corpus Analysis of Writing Tasks

Motivated by the concept of Communicative Language Ability and the eminence of the IELTS exam, this study intended to scrutinize the representation of functional knowledge (FK) and socio-linguistic knowledge (SK) as sub-components of pragmatic knowledge in the writing performances of both tasks of the online General IELTS-practice resources across three band scores. This quantitative inter-scor...

Neural Sequence-to-sequence Learning of Internal Word Structure

Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison. In this paper, we present a neural encoder-decoder model for learning canonical morphological segmentation. Our model combines character-level sequence-to-sequence transformation with a language model over canonical segments. We obta...

Human Rights Texts: Converting Human Rights Primary Source Documents into Data

We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, whic...

Publication date: 2015
